A novel over-sampling method and its application to miRNA prediction

Authors

  • Xuan Tho Dang
  • Osamu Hirose
  • Thammakorn Saethang
  • Vu Anh Tran
  • Lan Anh T. Nguyen
  • Mamoru Kubo
  • Yoichi Yamada
  • Kenji Satou
Abstract

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation in many biological processes. Most current computational methods, both comparative and non-comparative, classify human precursor microRNA (pre-miRNA) hairpins against both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising results by applying class imbalance learning methods, existing methods have not yet fully solved this problem because of the imbalanced class distribution in the datasets. For example, SMOTE is a well-known, general over-sampling method that addresses this problem; however, in some cases it fails to improve classification performance, and sometimes it even reduces it. Therefore, we developed a novel over-sampling method named incremental-SMOTE to distinguish human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs. Experimental results on pre-miRNA datasets from Batuwita et al. showed that our method achieved better Sensitivity and G-mean than the control (no over-sampling), SMOTE, and several modified successors of SMOTE, including safe-level-SMOTE and borderline-SMOTE. In addition, we applied the novel method to five imbalanced benchmark datasets from the UCI Machine Learning Repository and achieved improvements in Sensitivity and G-mean. These results suggest that our method outperforms SMOTE and several of its successors in various biomedical classification problems, including miRNA classification.
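The abstract does not describe how incremental-SMOTE works, so the following is only a minimal sketch of the baseline it is compared against: classic SMOTE interpolation between a minority sample and one of its nearest minority neighbours, together with the Sensitivity and G-mean metrics named above. The function names (`sensitivity_specificity`, `g_mean`, `smote`), the parameters `k` and `seed`, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch (not the paper's incremental variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy, made-up data: 4 positives (minority) and 6 negatives.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(y_true, y_pred))
    print(g_mean(y_true, y_pred))
    print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3, k=2))
```

G-mean is useful on imbalanced data because it drops to zero whenever either class is entirely misclassified, which plain accuracy does not reveal.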

Similar resources

Plasma Level of miRNA-7, miRNA-409 and miRNA-93 as Potential Biomarkers for Colorectal Cancer

Colorectal cancer (CRC) is among the deadliest cancers worldwide. Given its high mortality, some studies have focused on discovering new applicable diagnostic methods. In this regard, circulating miRNAs have received considerable attention as promising biomarkers for early detection of CRC. The study aimed to evaluate the expression level of miRNA-7, miRNA-93, and miRNA-409 in plasma of...

Identification of miR-24 and miR-137 as novel candidate multiple sclerosis miRNA biomarkers using multi-staged data analysis protocol

Many studies have investigated misregulation of miRNAs relevant to multiple sclerosis (MS) pathogenesis. Abnormal miRNAs can be used both as candidate biomarkers for MS diagnosis and for understanding the disease's miRNA-mRNA regulatory network. In this comprehensive study, misregulated miRNAs related to MS were collected from existing literature, databases and via in silico prediction. A multi-staged...

Normalization of qPCR array data: a novel method based on procrustes superimposition

MicroRNAs (miRNAs) are short, endogenous non-coding RNAs that function as guide molecules to regulate transcription of their target messenger RNAs. Several methods including low-density qPCR arrays are being increasingly used to profile the expression of these molecules in a variety of different biological conditions. Reliable analysis of expression profiles demands removal of technical variati...

Application of Grey System Theory in Rainfall Estimation

Considering the fact that Iran is situated in an arid and semi-arid region, rainfall prediction for the management of water resources is very important and necessary. Researchers have proposed various prediction methods that have been utilized in such areas as water and meteorology, especially water resources management. The present study aimed at predicting rainfall amounts using Grey Predicti...

QSAR studies and application of genetic algorithm - multiple linear regressions in prediction of novel p2x7 receptor antagonists’ activity

Quantitative structure-activity relationship (QSAR) models were employed to predict the activity of P2X7 receptor antagonists. A data set consisting of 50 purine derivatives was used in model construction, with 40 and 10 of these compounds in the training and test sets, respectively. A suitable group of calculated molecular descriptors was selected by employing stepwise multiple ...

Publication date: 2013